We address the problem of simulating efficiently from the posterior
distribution over the parameters of a particular class of nonlinear
regression models using a Langevin-Metropolis sampler. It is shown
that as the number $N$ of parameters increases, the proposal variance
must scale as $N^{-1/3}$ in order to converge to a diffusion. This
generalizes previous results of Roberts and Rosenthal [J. R. Stat. Soc. Ser.
B Stat. Methodol. 60 (1998) 255-268] for the i.i.d. case, showing the
robustness of their analysis.

1. Introduction. The motivation for the study of the kind of models an- 
alyzed in the present paper is the following. We consider a sequence of non- 
linear regression models (indexed by N) relating a scalar response variable 
y with a vector of covariates z 



where $h(\cdot\,; x)$ is some function depending on a $d$-dimensional vector of
parameters $x$ (weights) and $\varepsilon$ has a standard Gaussian distribution. If we take
$n$ independent measurements $Y = (y_1, \dots, y_n)$ on the response variable,
corresponding to the values $(z_1, \dots, z_n)$ for the covariates, and define the vector
$H$ with components $H_k(x) = h(z_k; x)$, $k = 1, \dots, n$, we get the measurement
equation

where $\varepsilon = (\varepsilon_1, \dots, \varepsilon_n)$ is a vector of i.i.d. standard Gaussians.

Following the Bayesian approach we take the vector of weights $(X_1, \dots, X_N)$
to be random with i.i.d. $\mu$-distributed components. Then the measurement
equation induces the following posterior distribution (i.e., conditional on
$Y = y$) on the weights:

$$\pi_N(dx) = C_N^{-1} \exp\Bigl( \sum_{i=1}^N (y, H(x_i)) - \frac{1}{2N} \sum_{i,j=1}^N (H(x_i), H(x_j)) \Bigr) \bigotimes_{i=1}^N \mu(dx_i), \tag{3}$$

where $(\cdot, \cdot)$ stands for the usual scalar product in $\mathbb{R}^n$.

These kinds of distributions are known in the statistical mechanics setting
as "mean field" models [12]. The study of such distributions with a general
nonlinear $H$ is made complicated by the interaction term, which destroys
the a priori independence among the weights. In Appendix A we recall that
propagation of chaos holds for the sequence of distributions (3) as $N \to \infty$
(Proposition 3, see also [1, 9]), which means that in the limit any finite
collection of variables behaves as if the individual components had been drawn
independently from a single probability measure $\pi$. This is characterized by

$$\log(d\pi/d\mu)(x) \propto \Bigl( y - \int H \, d\pi,\; H(x) \Bigr).$$

Moreover, we prove a moderate deviations result (Proposition 5) which will
be useful for the sequel.

In the rest of the paper we shall analyze the behavior of the
Metropolis-adjusted Langevin algorithm (MaLa) [16] for distributions of the type (3).
In order to simplify our analysis we shall consider the simplest case in which
$n = 1$ and the weights are one-dimensional. Moreover, we shall assume that
$\mu$ has an everywhere positive density w.r.t. the Lebesgue measure so the
measure (3) has in this case the following $N$-dimensional posterior density

$$\pi_N(x) \propto \exp\Bigl( \sum_{i=1}^N U(x_i) - \frac{1}{2N} \sum_{i,j=1}^N H(x_i) H(x_j) \Bigr), \tag{4}$$

where

$$U(x) = y H(x) + \log \mu(x)$$

and the limiting probability measure $\pi$ on the real line has a positive density
as well (called again $\pi$ to keep the notation simpler) with the property

$$\log \pi(x) \propto U(x) - H(x) \int H \, d\pi =: \psi(x).$$

In the following $X$ will always denote a random variable with density $\pi$ and
expected values of measurable functions $f(X)$ will be written as $\pi(f(X))$.
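The MaLa for the above density is a Markovian algorithm implemented
in the following way. In order to compute $X^{(N)}_{i+1}$ given $X^{(N)}_i$, first generate

$$Y^{(N)}_{i+1} = X^{(N)}_i + \sigma W_{i+1} + \frac{\sigma^2}{2} \nabla \log \pi_N(X^{(N)}_i), \tag{5}$$

where $W_{i+1}$ is a standard Gaussian on $\mathbb{R}^N$ independent of $X^{(N)}_i$. The law of
$Y^{(N)}_{i+1}$ given $X^{(N)}_i = x$, thus, has the density

$$q_N(x, y) \propto \exp\Bigl\{ -\frac{1}{2\sigma^2} \Bigl\| y - x - \frac{\sigma^2}{2} \nabla \log \pi_N(x) \Bigr\|^2 \Bigr\}
= \exp\Bigl\{ -\frac{1}{2\sigma^2} \sum_{i=1}^N \Bigl( y_i - x_i - \frac{\sigma^2}{2} \Bigl( U'(x_i) - H'(x_i) \frac{1}{N} \sum_{j=1}^N H(x_j) \Bigr) \Bigr)^2 \Bigr\}. \tag{6}$$

The proposal $Y^{(N)}_{i+1}$ is accepted or rejected according to the following rule:

$$X^{(N)}_{i+1} = \begin{cases} Y^{(N)}_{i+1}, & \text{if } \xi_{i+1} < \dfrac{\pi_N(Y^{(N)}_{i+1})\, q_N(Y^{(N)}_{i+1}, X^{(N)}_i)}{\pi_N(X^{(N)}_i)\, q_N(X^{(N)}_i, Y^{(N)}_{i+1})} \wedge 1, \\[2ex] X^{(N)}_i, & \text{otherwise,} \end{cases} \tag{7}$$

where $\xi_i$ are i.i.d. $U[0, 1]$.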
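
The proposal, proposal density and accept/reject rule (5)-(7) for the target (4) can be sketched in a few lines of Python. This is only an illustrative sketch: the particular choices $U(t) = -\sqrt{1+t^2}$ and $H(t) = \tanh t$ are assumptions made here because they satisfy bounded-derivative conditions of the kind used later, and all function names are hypothetical.

```python
import math
import random

# Toy model functions (assumptions for illustration, not taken from the paper).
def U(t):
    return -math.sqrt(1.0 + t * t)

def dU(t):
    return -t / math.sqrt(1.0 + t * t)

def H(t):
    return math.tanh(t)

def dH(t):
    return 1.0 - math.tanh(t) ** 2

def log_target(x):
    """log pi_N(x) up to an additive constant, as in (4)."""
    s = sum(H(v) for v in x)
    return sum(U(v) for v in x) - s * s / (2.0 * len(x))

def grad_log_target(x):
    """i-th component: U'(x_i) - H'(x_i) * (1/N) * sum_j H(x_j)."""
    mean_h = sum(H(v) for v in x) / len(x)
    return [dU(v) - dH(v) * mean_h for v in x]

def log_q(x, y, sigma):
    """log q_N(x, y) up to an additive constant, as in (6)."""
    g = grad_log_target(x)
    sq = sum((yi - xi - 0.5 * sigma ** 2 * gi) ** 2
             for xi, yi, gi in zip(x, y, g))
    return -sq / (2.0 * sigma ** 2)

def log_accept_ratio(x, y, sigma):
    """Logarithm of the ratio appearing in the acceptance rule (7)."""
    return (log_target(y) + log_q(y, x, sigma)
            - log_target(x) - log_q(x, y, sigma))

def mala_step(x, sigma, rng):
    """One MaLa transition; returns (new_state, accepted_flag)."""
    g = grad_log_target(x)
    y = [xi + sigma * rng.gauss(0.0, 1.0) + 0.5 * sigma ** 2 * gi
         for xi, gi in zip(x, g)]
    accept_prob = math.exp(min(0.0, log_accept_ratio(x, y, sigma)))
    if rng.random() < accept_prob:
        return y, True
    return list(x), False

def acceptance_rate(x, sigma, n_steps, rng):
    """Fraction of accepted proposals over n_steps transitions."""
    accepted = 0
    for _ in range(n_steps):
        x, ok = mala_step(x, sigma, rng)
        accepted += ok
    return accepted / n_steps
```

A typical call would set `sigma = ell / N ** (1 / 6)` for some `ell > 0` and monitor `acceptance_rate`; the chain here is started from an arbitrary point rather than from stationarity.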

In order to make the algorithm efficient the parameter $\sigma$ has to scale with
$N$. A thorough discussion of this problem is reported in the recent survey
[15], to which the reader is referred for more details. In the i.i.d. case ($H = 0$),
the optimal solution for the MaLa has been given by Roberts and Rosenthal
[14]. Our main result is a generalization of theirs for sequences of densities
of the type (4): if $\sigma$ is taken proportional to a suitable inverse power of the
number of variables, then the rescaled path of the algorithm converges weakly
to a product of one-dimensional diffusions with the same stationary density
$\pi(x)$. The choice of the proportionality factor only changes the (constant)
speed at which the paths of the diffusions are travelled.



Theorem 1 (Weak convergence of the MaLa). Assume:

(HP) The functions $H$ and $U$ have bounded derivatives of all orders; moreover,
$H$ itself is bounded, whereas $\lim_{x \to \pm\infty} U(x) = -\infty$.

Let $X^{(N)}_t = (X^{N,1}_t, \dots, X^{N,N}_t)$ be the MaLa defined by (7), with
$X^{(N)}_0 \sim \pi_N$ and $\sigma = \sigma_N = \ell / N^{1/6}$. The following weak convergence
result holds in the space $D[0,T]$,

$$\bigl( X^{N,1}_{[tN^{1/3}]}, \dots, X^{N,k}_{[tN^{1/3}]} \bigr) \Rightarrow \bigl( Z^1_{v(\ell)t}, \dots, Z^k_{v(\ell)t} \bigr) \tag{8}$$

for any integer $k$, where $\{Z^i : i = 1, 2, \dots\}$ are independent copies of the
process $Z_t$ which is the unique solution to the SDE

$$dZ_t = \tfrac{1}{2} (\log \pi)'(Z_t)\, dt + dB_t, \qquad Z_0 \sim \pi, \tag{9}$$

with $v = v(\ell) := 2\ell^2 \Phi(-\ell^3 \tau / 2)$, $\tau$ being a constant depending on $\pi$
(explicitly given in Lemma 7 in Appendix B). Moreover, the acceptance probability
converges as $N \to \infty$,

$$\lim_{N \to \infty} P\{ X^{(N)}_1 = Y^{(N)}_1 \} = 2\Phi(-\ell^3 \tau / 2) =: a(\ell).$$



An implication of this result is that as $N \to \infty$, for any $T > 0$,

$$\frac{1}{T N^{1/3}} \sum_{i=1}^{[T N^{1/3}]} g\bigl( X^{N,1}_i, \dots, X^{N,k}_i \bigr) \to \frac{1}{T} \int_0^T g\bigl( Z^1_{v(\ell)t}, \dots, Z^k_{v(\ell)t} \bigr)\, dt \tag{10}$$

weakly, if $g$ is bounded and continuous and depends only on $k$ components.
Now, by the propagation of chaos, when $N$ is sufficiently large, the asymptotic
bias

$$\Bigl| \int g(x_1, \dots, x_k)\, \pi_N(x_1, \dots, x_N)\, dx_1 \cdots dx_N - \int g(x_1, \dots, x_k)\, \pi(x_1)\, dx_1 \cdots \pi(x_k)\, dx_k \Bigr|$$

is small. On the other hand, by ergodicity of (9), when $T$ is large enough the
right-hand side of (10) will be close to $\int g(x_1, \dots, x_k)\, \pi(x_1) \cdots \pi(x_k)\, dx_1 \cdots dx_k$
with arbitrarily high probability [see, e.g., [17], Theorem (53.1)]. Hence, (10)
may be loosely interpreted as stating that the Monte Carlo estimate

$$\frac{1}{I} \sum_{i=1}^{I} g\bigl( X^{N,1}_i, \dots, X^{N,k}_i \bigr) \tag{11}$$

of $\int g(x_1, \dots, x_k)\, \pi_N(x_1, \dots, x_N)\, dx_1 \cdots dx_N$ requires a number of iterations $I$
proportional to $N^{1/3}$. How large $T$ must be depends on the mixing properties
of the diffusion $Z$, but it is, however, clear that for any fixed value of $T$ it
is convenient to have $v(\ell)$ as large as possible in order to enlarge as much
as possible the integration window. We can give an analytic expression for
the maximizer $\hat{\ell}$ of $v(\ell)$, but this is, in practice, useless since it cannot be
computed easily (except by Monte Carlo methods, which defeats somewhat
the purpose). Luckily, the functions $v(\ell)$ and $a(\ell)$ have the same form as
in [14], even if the constant $\tau$ is different in general. Hence, we can exploit
the fact that $a$ is a bijective function of $\ell$ in order to maximize easily $v$
as a function of $a$. Indeed, $v(a) \propto a \{ \Phi^{-1}(a/2) \}^{2/3}$, up to a constant factor
depending on $\tau$. Since this function has a unique maximum at $a \approx 0.574$, in
practice it suffices to monitor the acceptance rate $\frac{1}{I} \sum_{i=1}^{I} \mathbf{1}\{ X^{(N)}_{i+1} \neq X^{(N)}_i \}$
of the MaLa and tune $\ell$ until $a(\ell)$ equals 0.574.
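
The value 0.574 can be checked numerically from the relation $v(a) \propto a\{\Phi^{-1}(a/2)\}^{2/3}$ alone, since the $\tau$-dependent factor does not affect the location of the maximum. The following sketch uses a plain grid search; the function names and the grid resolution are illustrative assumptions, and the $2/3$ power is taken of the absolute value of the (negative) quantile.

```python
from statistics import NormalDist

def speed_vs_acceptance(a):
    """Diffusion speed as a function of the acceptance rate a in (0, 1),
    up to a positive constant depending on tau."""
    z = NormalDist().inv_cdf(a / 2.0)  # negative, because a / 2 < 1 / 2
    return a * abs(z) ** (2.0 / 3.0)

def optimal_acceptance(grid_size=9999):
    """Grid-search maximizer of speed_vs_acceptance over (0, 1)."""
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    return max(grid, key=speed_vs_acceptance)
```

Calling `optimal_acceptance()` returns a value close to 0.574, in agreement with the tuning rule stated above.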

As in the i.i.d. case, it is worth noticing the superiority of the MaLa over
the random walk Metropolis (RWM) algorithm. In the RWM algorithm the
proposal vector $Y^{(N)}$ has zero mean and, in order to obtain convergence
to a diffusion, $N^{1/3}$ has to be replaced by $N$, both in the scaling for the
variance and for the time. The original result in [13] has been extended in
[2] to Gibbs fields with no phase transition, and it could be proved for mean
field models like (4) as well. As a consequence, (10) essentially holds with
$N^{1/3}$ replaced by $N$, which implies that the required number of steps has
the order $N$ rather than $N^{1/3}$. The only difference is that the function $v(\ell)$
has to be replaced by some other function, which this time is maximized
when the acceptance rate is roughly equal to 0.234.

A final comment concerns the assumption made in Theorem 1 that the
initial value $X^{(N)}_0$ is already distributed according to the target density $\pi_N$,
which is clearly unrealistic. This means that, in practice, the partial sums
in (10) do not start from 1, but typically from some large value $t_0$, which
ensures that the effect of the initial value $X^{(N)}_0$ can be neglected. A deeper
study of the scaling behavior of the MaLa and the RWM when started in
the tails of the target density $\pi_N$ has been initiated in [3].



2. A quantitative central limit theorem for the log-acceptance ratio. A
fundamental step towards the proof of Theorem 1 is to establish a quantitative
central limit theorem (CLT) for the log-acceptance ratio

$$G_{\sigma,N}(x, W) = \log \frac{\pi_N(Y_\sigma(x, W))\, q_N(Y_\sigma(x, W), x)}{\pi_N(x)\, q_N(x, Y_\sigma(x, W))}, \tag{12}$$

where $x = (x_1, \dots, x_N)$ is fixed, $W = (W_1, \dots, W_N)$ is a random vector having
i.i.d. $\mathcal{N}(0, 1)$ components defined on some probability space $(\Omega, \mathcal{F}, P)$ and
$Y_\sigma(x, W)$ is the proposal vector given by

$$Y_{\sigma,i}(x, W) = Y_i = x_i + \sigma W_i + \frac{\sigma^2}{2} \Bigl( U'(x_i) - H'(x_i) \frac{1}{N} \sum_{j=1}^N H(x_j) \Bigr), \tag{13}$$

for $i = 1, \dots, N$, with $\sigma = \sigma_N = \ell / N^{1/6}$, for some $\ell > 0$.
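Proposition 2 (CLT for the acceptance ratio). There exist measurable
sets $F_N \subset \mathbb{R}^N$, with $\pi_N(F_N^c) = o(N^{-t})$ for any $t > 0$, such that

$$\lim_{N \to \infty} N^{\beta} \sup_{x \in F_N} \sup_{u \in \mathbb{R}} \Bigl| P\Bigl( \frac{G_{\sigma_N,N}(x, W) + \ell^6 \tau^2 / 2}{\ell^3 \tau} \le u \Bigr) - \Phi(u) \Bigr| = 0 \tag{14}$$

for any $\beta > 0$ sufficiently small, where $\tau$ is some positive constant.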

Before starting the proof we set up a convenient notation. First, we shall
denote by $E_N$ empirical averages w.r.t. the vector $(x, W, Y)$, that is,

$$E_N f(x, W, Y) = \frac{1}{N} \sum_{i=1}^N f(x_i, W_i, Y_i). \tag{15}$$

In order to shorten the notation even further the function $f$ is allowed to
contain empirical averages as arguments as well, in which case they have to
be considered as constants. In particular, for

$$\psi_N(t; x) = U(t) - H(t) E_N H(x), \tag{16}$$

we define

$$(E_N \psi_N)(x) = \frac{1}{N} \sum_{i=1}^N \psi_N(x_i; x) = E_N U(x) - (E_N H(x))^2,$$

and we apply the same convention to empirical averages of derivatives

$$\psi_N^{(k)}(t; x) = U^{(k)}(t) - H^{(k)}(t) E_N H(x)$$

and to their products. Finally, we use the shortened notation

$$E_N g(x) W^l = \frac{1}{N} \sum_{i=1}^N g(x_i) W_i^l \tag{17}$$

and

$$E_N h(Y) W^l = \frac{1}{N} \sum_{i=1}^N h(Y_i) W_i^l. \tag{18}$$

Moreover, we will always use the same letter $C$ for several constants
appearing in the estimates.

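Proof of Proposition 2. By direct computation the first two derivatives
of $G_{\sigma,N}(x, W)$ w.r.t. $\sigma$ vanish at $\sigma = 0$. Consequently, we have the
Taylor expansion

$$G_{\sigma,N}(x, W) = \sum_{k=3}^{6} \sigma^k g_{k,N}(x, W) + \frac{1}{6!} \int_0^\sigma (\sigma - u)^6 \frac{d^7}{du^7} G_{u,N}(x, W)\, du, \tag{19}$$

where $g_{k,N}(x, W) = \frac{1}{k!} \frac{d^k}{du^k} G_{u,N}(x, W) \big|_{u=0}$ for $k = 3, \dots, 6$. For completeness
the explicit form of these functions is given in Lemma 6 in Appendix B.
Setting $\sigma = \ell / N^{1/6}$ and standardizing as in (14), we have

$$\frac{G_{\sigma,N}(x, W) + \ell^6 \tau^2 / 2}{\ell^3 \tau} = \frac{1}{\tau \sqrt{N}}\, g_{3,N}(x, W) + \frac{\ell}{\tau N^{2/3}}\, g_{4,N}(x, W) + \frac{\ell^2}{\tau N^{5/6}}\, g_{5,N}(x, W) + \frac{\ell^3}{\tau N} \Bigl[ g_{6,N}(x, W) + \frac{N \tau^2}{2} \Bigr] + I_N =: A_N + B_N + C_N + D_N + I_N,$$

where $I_N$ denotes the standardized remainder term of (19). By using a standard
lemma on distribution functions ([11], Lemma 1.9, page 20) we obtain the
following estimate:

$$\sup_u \Bigl| P\Bigl( \frac{G_{\sigma,N}(x, W) + \ell^6 \tau^2 / 2}{\ell^3 \tau} \le u \Bigr) - \Phi(u) \Bigr| \le \sup_u |P(A_N \le u) - \Phi(u)| + P(|B_N| > \varepsilon_N) + P(|C_N| > \varepsilon_N) + P(|D_N| > \varepsilon_N) + P(|I_N| > \varepsilon_N) + \frac{4 \varepsilon_N}{\sqrt{2\pi}}, \tag{20}$$

where $(\varepsilon_N)$ is an arbitrary sequence of positive numbers to be chosen in the
sequel.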

In Appendix B various lemmas are proven in order to estimate separately
each term appearing on the right-hand side of (20). By Lemma 7, for any
$N$ and $\varepsilon_N > 0$,

$$\sup_u |P(A_N \le u) - \Phi(u)|$$

where $F_3$ is a polynomial, $r_3$ is a vector of bounded measurable functions and
$h_\tau$ is a locally Lipschitz function vanishing at

$$\tau^2 = F_3(\pi(r_3(X))).$$

Denote by $C_3$ the inverse of the local Lipschitz constant of $h_\tau$ at $\tau^2$. Therefore,
for

$$x \in F_{N,3}(\varepsilon_N) = \{ x : |E_N r_3(x) - \pi(r_3(X))| < C_3 \varepsilon_N \},$$

it holds

$$\sup_u |P(A_N \le u) - \Phi(u)| \le C \Bigl( \frac{1}{\sqrt{N}} + \frac{1}{N \varepsilon_N^2} + \varepsilon_N \Bigr) \tag{22}$$
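provided $\varepsilon_N$ goes to zero. By Lemma 11, for any $N$ and $\varepsilon_N > 0$,

$$P(|B_N| > \varepsilon_N) \le \frac{C}{N^{1/3} \varepsilon_N^2}, \tag{23}$$

$$P(|C_N| > \varepsilon_N) \le \frac{C}{N^{2/3} \varepsilon_N^2}, \tag{24}$$

$$P(|D_N| > \varepsilon_N) \le \frac{C}{N \varepsilon_N^2}, \tag{25}$$

for $x \in \bigcap_{k=4}^{6} F_{N,k}(\varepsilon_N)$, where

$$F_{N,k}(\varepsilon_N) = \{ x : |E_N r_k(x) - \pi(r_k(X))| < C_k \varepsilon_N N^{k/6 - 1} \},$$

$r_k$ being a vector of functions for $k = 4, 5, 6$, and $C_k$, $k = 4, 5, 6$, are suitably
small constants. Furthermore, by Lemma 12,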

(26) n\lN\>eN)< ^ 



Finally, set $F_N = \bigcap_{k=3}^{6} F_{N,k}(\varepsilon_N)$, and choose $\varepsilon_N = N^{-1/9}$. In order to
estimate $\pi_N(F_N^c)$ we need to control deviations of empirical averages
from expected values under $\pi$ of the order $N^{-\alpha_k}$, where $\alpha_3 = \alpha_6 = 1/9$,
$\alpha_5 = 5/18$ and $\alpha_4 = 4/9$. Since the latter is the largest, it is enough to apply
Proposition 5 in Appendix A with $\lambda_N = N^{1/18}$, in which case $N^{-1/2} \lambda_N =
N^{-4/9}$. By consequence,

k=3 

which is $o(N^{-t})$ for any $t > 0$, as claimed.



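Using the bounds (20), (22)-(26) we get that

$$\sup_u |P(G_{\sigma,N}(x, W) \le u) - \Phi_{-\ell^6 \tau^2 / 2,\, \ell^6 \tau^2}(u)| = O(N^{-1/9}),$$

where $\Phi_{m, s^2}$ denotes the distribution function of a Gaussian with mean $m$ and variance $s^2$. □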

3. Proof of Theorem 1. Let $f$ be any smooth function with compact
support from $\mathbb{R}^k$ to $\mathbb{R}$. Define on $f$ the discrete generator,

$$A_{\sigma,N} f(x) = E[ f(X^{(N)}_1) - f(x) \mid X^{(N)}_0 = x ] = E[ (f(Y_\sigma) - f(x))\, 1 \wedge e^{G_{\sigma,N}(x, W)} ], \tag{27}$$

and the infinitesimal generator of the process $(Z_{v(\ell)t})$,

$$A f(x) = \frac{v(\ell)}{2} \sum_{p=1}^{k} \Bigl[ f_{x_p x_p}(x) + \Bigl( U'(x_p) - H'(x_p) \int H\, d\pi \Bigr) f_{x_p}(x) \Bigr]. \tag{28}$$

By [7], Corollary 8.9, page 233, the weak convergence (8) holds, provided we
exhibit measurable sets $F_N \subset \mathbb{R}^N$ such that

$$\lim_{N \to \infty} P\bigl( X^{(N)}_{[tN^{1/3}]} \in F_N \text{ for all } t \le T \bigr) = 1 \tag{29}$$

and

$$\lim_{N \to \infty} \sup_{x \in F_N} |N^{1/3} A_{\sigma_N,N} f(x) - A f(x)| = 0 \tag{30}$$

for any smooth $f(x) = f(x_1, \dots, x_k)$ with compact support. Notice that since
$X^{(N)}_0 \sim \pi_N$ and $\pi_N$ is stationary,

$$P\bigl( X^{(N)}_{[tN^{1/3}]} \notin F_N \text{ for some } t \le T \bigr) \le [N^{1/3} T]\, \pi_N(x : x \notin F_N).$$

Thus, in order to ensure (29) it is enough to check that $\pi_N(F_N^c) = o(N^{-1/3})$.
By [18], Proposition 2.2, page 177, it is enough to prove (30), for $k = 2$, in
order to get the convergence (8) for any integer $k$.

For a fixed x S M.^ we expand Aa,Nf{x) in powers of a, which is obtained 
by recalling that Y^^i is defined in (13), 

A,,;v/(x)=E[(/(y,,i,y,,2)-/(xi,X2))lAe^-.^("'^)] 



i=l 

2 

(31) + Y/x.(f/'(xi) - H'{xi)EMH{x)) ] + a^WiW2U,x, 



X E(l Ae^--'^(^'^)|VFi,M^2) 



+ a^rN{cr,x), 



where partial derivatives of / are always evaluated at (xi,X2) if not specified 
otherwise, and 

rN{cT,x) = |e| (^E[/x„x„.,(y^,i,l^^,2) 

X (VF, + a{U\xi) - H'{xi)ENH{x))f 
+ 3/.„..(y~ „y~2) • {U\xi)-H\xi)ENH[x)) 
X {Wi + d{U'{x,) - H'{x^)EnH{x))) 
+ 3^ [^,,.,,.^ {Wi + a{U'{xi) - H'{xi)ENH{x))f 
X {Wj + a{U'{xj) - H'{xj)ENH{x)))

+ fx.,x, (Wi + a{U'{xi) - H'{xi)ENH{x))) 

X {U'{xi)-H'{xi)ENH{x))\^ 
X 1 Ae^-^(^'^)|, 

where < cr < cj. By assumption (HP), plugging in o" = = ^N~^^^, the 
remainder r^{a^x) is uniformly bounded in and x. 

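Next, observe that if $\Gamma(u)$ is an absolutely continuous function of the real
variable $u$, then

$$1 \wedge e^{\Gamma(1)} = 1 \wedge e^{\Gamma(0)} + \int_0^1 \mathbf{1}_{\{\Gamma(u) < 0\}}\, \Gamma'(u)\, e^{\Gamma(u)}\, du.$$

Now we apply this formula to the function $\Gamma(u) = G_{\sigma,N}(x, uW_1, uW_2, W^{(2)})$,
where $W^{(2)} = (W_3, \dots, W_N)$, and take conditional expectations w.r.t. $(W_1, W_2)$:

$$E(1 \wedge e^{G_{\sigma,N}(x, W)} \mid W_1, W_2) = E(1 \wedge e^{G_{\sigma,N}(x, W)} \mid W_1 = 0, W_2 = 0) + \sum_{i=1}^{2} W_i \int_0^1 E\bigl( \mathbf{1}_{\{\Gamma(u) < 0\}}\, G^{(i)}_{\sigma,N}(x, uW_1, uW_2, W^{(2)})\, e^{\Gamma(u)} \mid W_1, W_2 \bigr)\, du, \tag{32}$$

where $G^{(i)}_{\sigma,N}$ denotes the partial derivative of $G_{\sigma,N}(x, W_1, W_2, W^{(2)})$ w.r.t. the
variable $W_i$.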

We now substitute (32) into the expression (31) so we obtain the following 
expression: 

2 

^A^,Nf{x) = ^ ^ [/x.,x,E(l A eG..-(-'^) I VTi = 0,W2 = 0) 

%=1 

(33) + UAU\x,) - H'ix,)E^H{x))E{l A e^-^(^'^)) 
+ RN{cr,x), 

where 
RN{cr,x) 

2 r ^1 

= a~^J2f-^^ t^^^^^^^^^G%{x,uWl,uW2M'^)e^^''^du 
i=l



Wt 



^2 / t^t^^^^^^G%{x,uWi,uW2M''^V^''^du 



+ E 



+ crrAr((T, x). 



Now we concentrate on the term in the above expression, since the 
others are more easily controhed with similar arguments. First, bound \fxi\ 
with a constant, then we are left to bound for i = 1,2, 



(34) 



(T 



<-^(wf sup \G^^N^\x,uWuuW2,W^'''^)\\. 

O" \ 0<M<1 ' / 



Let us write explicitly 



a 



{U'{Yi) - U'ixi) - H'{Yi)ENH{Y) + H' {xi)ENH{x)) 



a 
T 



N 



Wi{U"{Yi) - H"{Y,)EnH{Y)) - H'{Y,)- ^ WkH'{Yk) 



k=l 



+ -{U"{Yi) - H"{Y,)EmH{Y) - H'{Y,)EnH'{Y)), 



where we have written Yi for Y^^j. Using (HP), we can rewrite the right-hand 
side of (34) as 

-e(w^ sup \G^^N^'\x,uWi,uW2,W^''^) 

0" V 0<M<1 

= Ie(wI sup \U'{Yi{u))-U'{xi) 

^ \ 0<u<l 

- B'{Ji(u))ENliiJ(u)) + H'{x^)EnH{x)\ ] + o(l 
where Yk{u) = Yk + au J2i=i ^kiWi, k = 1, . . . ,N . We have now 
eIw^eI sup \U'{Yi{u))-U'{xi) 

\ \0<u<l 
- H'{Y,{u))EnH{Y{u)) + H'{xi)EMH{x)

(35) <¥.(wf sup \U'{Yi{u)) - U'{xi)\ 

\ 0<-u<l 

+ ¥.(wf sup \H'{xi)-H'{Y,{u))\ENH{2 

\ 0<u<l 

+ ¥.(wf sup {\H'{Yi{u))\EN\H{Y{u))-H{x)\) 

\ 0<u<l 

Observe that, when T is either [/', H' or H and i = 1, 2, we can write, using 
the fundamental theorem of calculus. 



T{Yi{u))-T{x^) 



T 



2 

x, + '^{U'{xi)-H'{x^)ENH{x)) 



T{xi) + T{Y,{u)) 



2 

+ ^{U'{xi) - H'{xi)ENH{x)) 



-T 

2 

'^{U'{xi)-H'{x^)ENH{x)) 



X T' (^x, + "^{U'ixi) - H'{x,)EnH{x))J dv 

+ aW^ £t'(^x, + y (C/'(xi) - H'{x,)EnH{x)) + saP^i) ds. 

By bounding the derivative of T and substituting ctn = iN~^^^, we have 
sup \T{Y,{u)) - T{xi)\ < + \W,\). 

0<u<l 

By substituting this bound into (35) and, subsequently in (34), the right- 
hand side is bounded by 0(iV~^/^) uniformly over x. Similar arguments 
allow us to conclude that R]y{a,x) ^ as N ^ oo, uniformly over x as well. 

Now let $G$ be a Gaussian random variable with mean $-\ell^6 \tau^2 / 2$ and
variance $\ell^6 \tau^2$. It is immediately seen that $E(1 \wedge e^{G}) = 2\Phi(-\ell^3 \tau / 2)$. By an
integration by parts we have

$$|E(1 \wedge e^{G_{\sigma,N}(x, W)}) - E(1 \wedge e^{G})| \le C \sup_u |P(G_{\sigma,N}(x, W) \le u) - \Phi_{-\ell^6 \tau^2 / 2,\, \ell^6 \tau^2}(u)|,$$

which goes to zero uniformly for $x \in F_N$ by Lemma 7. Moreover,
|E(1 A e^'^-'^(^''^)|H^i = 0, = 0) - E(l A e^-.'^^^'^))] 
< E\G^,n{x,0, 0, W^'^) - G,,n{x, W)\

< Ve / \WiG^,N^'\x,uWi,uW2M'^)\du, 

^=l 

and by the same argument as before, the right-hand side goes to zero uni- 
formly over X. Finally, we have 

\N'/''A^,r^f{x)-Af{x)\ 

= i^\afA^,Nfix)-r''Af{x)\ 

<I^"'E[l/x„x,(^i,^2)||IE(l A e^-^(^'^)|t^i = 0,^2 = 0) 

i=l 

-E(l Ae^)| 

+ |/x,(xi,a;2)||[/'(x,)||E(lAe«-^(^'^))-E(lAe^)| 

+ \f,^{xi,X2)\\H'iXi)\ 

X (\EnH{x)\\E{1 a e^-.'^^^'^)) - E(l A e^)| 

+ \ENH{x)-7T{H{X))\\E{lAe-^)\)' 

+ \RN{x,f)\. 

Next define $\tilde{F}_N = F_N \cap \{ x : |E_N H(x) - \pi(H(X))| < N^{-1/9} \}$. By using
Proposition 5 in Appendix A it is immediately verified that $\pi_N(\tilde{F}_N^c) = o(N^{-t})$ for
any $t > 0$. The proof is complete since the right-hand side of the last
expression goes to zero uniformly on $\tilde{F}_N$.
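
The Gaussian identity used in the proof, $E(1 \wedge e^{G}) = 2\Phi(-s/2)$ for $G$ Gaussian with mean $-s^2/2$ and variance $s^2$ (here $s = \ell^3 \tau$), can be checked numerically. The sketch below compares a midpoint-rule quadrature with the closed form; the function names, the integration window and the number of nodes are illustrative assumptions.

```python
import math
from statistics import NormalDist

def expected_min_one_exp(s, n=20000, width=10.0):
    """E[min(1, exp(G))] for G ~ N(-s^2/2, s^2), by the midpoint rule
    on the interval mean +/- width * s."""
    mean = -0.5 * s * s
    lo = mean - width * s
    hi = mean + width * s
    h = (hi - lo) / n
    dist = NormalDist(mean, s)
    total = 0.0
    for k in range(n):
        g = lo + (k + 0.5) * h
        total += min(1.0, math.exp(g)) * dist.pdf(g)
    return total * h

def closed_form(s):
    """The limiting acceptance probability 2 * Phi(-s / 2)."""
    return 2.0 * NormalDist().cdf(-0.5 * s)
```

For instance, `expected_min_one_exp(1.0)` and `closed_form(1.0)` agree to several decimal places.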

APPENDIX A. 

In this appendix we discuss the asymptotic behavior of sequences of
distributions $\pi_N$ defined in (3) for a general measurable function $H$. For ease
of notation we drop from now on boldfaces used to indicate $n$-dimensional
vectors. First let us introduce the exponential family of probability measures
on $\mathbb{R}^d$ generated by $\mu$ and $H$, which is defined by

$$\mu_\theta(dx) = \exp\{ (\theta, H(x)) - K(\theta) \}\, \mu(dx),$$

where $K(\theta) = \log \int e^{(\theta, H(x))} \mu(dx)$ is the cumulant generating function of $H$
under $\mu$. We assume that $K$ is finite only in an open set $\Theta$ of $\mathbb{R}^n$ and that
no hyperplane of $\mathbb{R}^n$ contains $H(x)$ $\mu$-almost surely (in the case $n = 1$ this
is equivalent to assuming that $H$ is nonconstant). Moreover, in the paper we
assumed that $H$ is bounded so $K$ is defined on the whole space.
Consider now the strictly convex function $J(\theta) = \frac{1}{2} \|\theta - y\|^2 + K(\theta)$, where
$K$ is extended to the complement of $\Theta$ by setting its value equal to $+\infty$. This
function has a unique minimum $\theta_* = \theta_*(y)$ in $\mathbb{R}^n$ (as it is strictly convex and
lower semicontinuous with compact level sets), that is the unique solution
of the equation

$$\theta + \nabla K(\theta) = y, \tag{36}$$

which implies, by the properties of exponential families, that

$$\theta_* = y - \int H\, d\pi. \tag{37}$$

We can now state the following: 

Proposition 3 (Propagation of chaos). Whenever $f : (\mathbb{R}^d)^\infty \to \mathbb{R}$ is
a bounded measurable local function (i.e., it depends only on a finite number
of components), then

$$\lim_{N \to \infty} \int f\, d\pi_N = \int f\, d\pi^{\otimes \infty},$$

where $\pi = \mu_{\theta_*}$.

Proof. We can easily bound the Kullback-Leibler divergence

$$D(\pi_N \| \pi^{\otimes N}) = \int \log(d\pi_N / d\pi^{\otimes N})\, d\pi_N.$$

In fact, by using (37) and setting $\bar{H}(x) = H(x) - \int H\, d\pi$,



N 



1 



N 



: logCj^i + NKi9,) + J2{m - 9,,H{xi)) - ^ E {H{x,),H{xj)) 

2 



logC7-i + iVJf(0,) + 



i=l 

N 



Hdir 



N 

y 



Hdn 



\i,j=l 



i=l 



N 
where



= logC]^'+NJ{9,) 



N 



HdiT 



log / expl -- 



N 



1=1 



N 



)TT{dXi), 



i=l 



and, therefore, by the CLT, the right-hand side of the above expression
converges to

$$-\log E\bigl( \exp( -\tfrac{1}{2} \|Z\|^2 ) \bigr),$$

where $Z$ is a zero mean Gaussian vector. Hence, it is bounded uniformly in
$N$ by some constant $M_0$. By consequence $D(\pi_N \| \pi^{\otimes N}) \le M_0$. It follows that
if we denote by $\pi_{N,k}$ the marginal of $\pi_N$ for the first $k$ components, then an
inequality of Csiszár [5], equation (2.11), page 772, yields

and now the stated convergence follows by [4], Lemma 3.1. □

In the forthcoming Proposition 5, we shall need the following technical 
lemma. 

Lemma 4. For any symmetric nonnegative definite matrix $A$ of order $s$,
the convex conjugate of $\theta \mapsto \frac{1}{2}(\theta, A\theta)$ is given by

$$M^*(z) = \begin{cases} \frac{1}{2}(z, A^- z), & \text{if } z \in \operatorname{Ran} A, \\ +\infty, & \text{otherwise}, \end{cases}$$

where $A^-$ is the pseudo-inverse of $A$. As a consequence, the origin is the
unique minimizer of $M^*$.

Proof. Let $A = U^T L U$, with $L$ a diagonal matrix with the diagonal
elements equal to the eigenvalues $(\lambda_i)$ of $A$. Then $A^- = U^T L^- U$, where $L^-$
is the diagonal matrix with diagonal elements equal to the reciprocal of the
eigenvalues (if positive) of $A$ and zero otherwise. By definition,

$$M^*(z) = \sup_\theta \bigl( (z, \theta) - \tfrac{1}{2}(\theta, A\theta) \bigr) = \sup_w \Bigl( \sum_{i=1}^{s} v_i w_i - \frac{1}{2} \sum_{i=1}^{s} \lambda_i w_i^2 \Bigr),$$

where $v = Uz$ and $w = U\theta$. If there exists $i_0$ such that $\lambda_{i_0} = 0$ and $v_{i_0} \neq 0$
(which happens if and only if $z \notin \operatorname{Ran} A$), it is immediately seen that
$M^*(z) = +\infty$. Otherwise, the function between round brackets has a
maximum at $w_i = v_i / \lambda_i$ for $i$ such that $\lambda_i > 0$, $w_i = 0$ otherwise. Finally, it is easily
seen that

$$M^*(z) = \frac{1}{2} \sum_{i : \lambda_i > 0} \frac{v_i^2}{\lambda_i} = \frac{1}{2}(z, A^- z)$$

for $z \in \operatorname{Ran} A$. □
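
The formula of Lemma 4 is easy to evaluate once the matrix has been diagonalized. The sketch below works directly in the eigen-coordinates $v = Uz$ used in the proof and, for a single positive eigenvalue, compares the closed form with a brute-force supremum over a grid. The function names, the tolerance and the grid are illustrative assumptions.

```python
def conjugate_of_quadratic(eigenvalues, v, tol=1e-12):
    """Convex conjugate of theta -> 0.5 * (theta, A theta), evaluated in the
    eigen-coordinates v = U z of A: 0.5 * sum v_i^2 / lambda_i over positive
    eigenvalues, and +infinity if v has a component along the kernel of A."""
    total = 0.0
    for lam, vi in zip(eigenvalues, v):
        if lam > tol:
            total += 0.5 * vi * vi / lam
        elif abs(vi) > tol:
            return float("inf")
    return total

def brute_force_conjugate_1d(lam, v, radius=50.0, steps=200001):
    """sup over w in [-radius, radius] of v * w - 0.5 * lam * w^2 on a grid."""
    best = float("-inf")
    for k in range(steps):
        w = -radius + 2.0 * radius * k / (steps - 1)
        best = max(best, v * w - 0.5 * lam * w * w)
    return best
```

With eigenvalues `[2.0, 0.0]`, a vector such as `[1.0, 0.0]` lies in the range and gives a finite value, while `[1.0, 0.5]` has a component in the kernel and gives `inf`.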



Proposition 5 (Moderate deviations). If the sequence $\{\lambda_N\}$ is such
that $\lambda_N \to \infty$ but $\lambda_N^2 / N \to 0$, then for any bounded measurable function



N 



i=l 



gdn 



> 



N 



where $c > 0$ is a constant and $\pi = \mu_{\theta_*}$.



Proof. Define g{xi) = g{xi) — J gdn, H{xi) = H{xi) — J H dn and 

N 



i=l 



{ZN,YN) = {XN^r^Y.^g{xi),H{xi)). 

Now it is easy to compute (see, e.g., [6]) 

A(0,V)= lim ^\og U^vXl{{e,ZN) + {^,YM))dTi' 

N^oo Air J 



1 



((^,^),S(0,^)), 



where S is the covariance matrix of (jj{x),H(x)) under vr. By applying the 
Gartner-Ellis theorem and Lemma 4, we prove that (Z]\f,Yisf) satisfies un- 
der TT an LDP with speed and rate function 

\ +00, otherwise. 

We want to prove the same result for the sequence Zn under ttn- Decompose 
S into blocks as 



5^11 : 5^12 




^ Jgg'dn 


JgH'dn^ 


\II21 : ^22/ 




[jHg'dTT 


J HH^ dn J 



OPTIMAL SCALING OF MALA 






and write 



log J exp{X%{9,ZN)}dTTN. 



Next apply Varadhan's lemma ([6], Theorem 4.3.1, page 137) to the contin- 
uous function ip{z,y) = {9,z) — which satisfies the moment condition 



lim -^log fexp{aX%ip{ZN,YN))dTT'^^ 

< lim T-n-log / exp(aA^(0, Ztv)) dTT*^ 
Af^oo Aif J 



ji™ T^l°g / exp('^(a(9,c/(xi)))7r(dxi) 

Af-+oo AjY J \\/N / 



lim ^ 1 + ^^(0, Siie)+oM^ 



for any constant a. Since Cn is bounded in N, we obtain 
A(0):= lim -^Ajv(^) 



lim -^log / expA^99(Z;v,>"jv)c^vr®^ 

Af— >oo A^ J 

sup{(y9(2;,y) - J(2:,y)}. 



z,y 



In order to maximize the right-hand side, write (z,y) as 'E{u,v), without 
loss of generality since J is equal to -|-cxd out of the range of S. Now 

sup{ip{z,y) - J{z,y)} = sup{(6', Sun Si2u) - ^\\i:2iu + T.22v\\'^ 



z,y 



- i((n, Siiu) + {v, T,22v) + 2{u, 212^))}. 

The function to be maximized is concave in {u,v) and it is immediately 
checked that {—9,{I+ Ti22)~^'^2iff) is a stationary point. Substituting this 
back into the above expression, we finally arrive at 

1(9) = ^{6,39), 

where B = Sn — Si2(/ + 5]22)~^S2i. In order to apply the Gartner-Ellis 
theorem, we need only to check that B is nonnegative definite and apply 






L. A. BREYER, M. PICCIONI AND S. SCARLATTI 



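Lemma 4. Set $A = \Sigma_{12} \Sigma_{22}^-$. Since $\operatorname{Ker}[\Sigma_{22}] \subset \operatorname{Ker}[\Sigma_{12}]$, we have $\Sigma_{12} = A \Sigma_{22}$.
As a consequence,

$$\operatorname{Var}_\pi[ g(x_1) - A H(x_1) ] = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^- \Sigma_{21} \ge 0.$$

Now consider the difference

$$D = \Sigma_{12} \Sigma_{22}^- \Sigma_{21} - \Sigma_{12} (I + \Sigma_{22})^{-1} \Sigma_{21} = \Sigma_{12} \bigl( \Sigma_{22}^- - (I + \Sigma_{22})^{-1} \bigr) \Sigma_{21},$$

and notice that the matrix between round brackets is nonnegative definite on
$\operatorname{Ran} \Sigma_{22}$. But since $\operatorname{Ran}[\Sigma_{21}] \subset \operatorname{Ran} \Sigma_{22}$ (as a consequence of the inclusion
$\operatorname{Ker} \Sigma_{22} \subset \operatorname{Ker} \Sigma_{12}$), $D$ is nonnegative definite, and, hence, so is $B$. The
explicit estimate in Proposition 5 follows by taking $c = \inf\{ \Lambda^*(z) : z \notin B_1 \} >
0$, where $B_1$ is the unit sphere in $\mathbb{R}^m$. □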



The results of this appendix can be directly applied to the sequence of 
densities ttat defined in (4) by setting m = and ii{dx) = exp{f7 (x)} dx. 



APPENDIX B. 

Let $\mathcal{D}$ be the set of monomials in the derivatives of $H$ and $U$. By
assumption (HP) functions in $\mathcal{D}$ are bounded. The following lemma is the result of
a tedious but straightforward computation, whose details are omitted.



Lemma 6. For /i = 0, 1, 2, . . . , 

jh h+2 

(38) ^G,,7v(x, W)=nY, a^PkiENPe{x)MYa)W'^;i = 1, . . . , m,.), 

fc=0 

for some integers ruk, where Pk is a polynomial and pi,ipi gV. In particular, 
the derivatives qu^n {x,W) = ^^GuAx,W){Q), fork = 3,...,6, have the 
following explicit form: 

N 

93,n{x, W) = - {ENiSi^'^^l^'^W + ^'I^W') 
(39) 

- 3EN{H'W)EN{H'ij'^) - 3En{H"W^)En{H'W)), 

9i,N = (f^(3V'^V^v' + ^^^nW^ + Q^^n^'nW^ + i^'N'W) 

(40) - 3{{ENH'^'j^f + 2{EnH"W^){EnH'^'j^) 

+ {ENH"W^f} + 64,N), 

(41) 95,N=N55,N, 
and

N 

^•^'^-"1440 

(42) + eOV^rV'^^' + 4?Ar'M^'') 

- [90{ENH'i^'j^^';^){ENH'^P'j^) + 180(^7vii"V^')(^7vi^V^v) 
+ 90(S^V^ii"Ty2)(^^i/V^) 
+ 90iENH'iP'l,^P'j^){ENH"W^) 

+ im{ENH"i;'^^){ENH'^'^) + 9Q{En^%H'W'^){EnH"W'^) 
+ ?,m{ENH"'i^'j^W^){ENH'i^'j^) 

+ 180(£;7V-H'"VF2)(^jv-H''V7v) 
+ lSQ{ENH"W^){ENH"ij%W'^) 

+ 60(Sjv^""W^^)(^7V^V^v) 

+ 2,m{ENH"W'^){ENH"'i;'^W'^) 

+ QQ{EnH""W^){EnH"W'^)] 
+ [45(Sjv^'^)(^7V^V^v)^ + 90{ENH'^){ENH"W^){ENH'ij'N) 

+ A5{ENH'^){ENH"W^f] 

where S4^n,6^^n o,nd Sq^n sums of monomials in empirical averages of 
the type (15) and (18) and each of them has at least a factor with an odd 
value of I . 

Lemma 7. Set 

= T^,{97Tir'iX)i^'\X)) + 18vr(^'(X)<(X)<'(X)) + 157r(^'"2(^)) 
- 187T{H"{X) + H'{X)i^'{X))7T{H'{X){r'{X) + i;'{X)r{X))) 

(43) 

+ 97r{H'^{X)){n{H"{X)) + 7r{H' {X)i^' {X))f} 

= :Fs{n{rs{X))) 

for some polynomial F^ and some vector r^, with components in D. 
Then for any $N$ and $\varepsilon_N > 0$,



sup 
(44) 



<u] - $(u) 



where /it-(x) = |1 V -^Ijl — is a continuous Lipschitz function vanishing 



at T^. 



Proof. Let us define 



- ^En{H'^'j^{x))En{H'{x)W) - ?,EnH"{x)En{H'{x)W)] 

and 



Yn = ^fEM{H"ix)iW^ - 1))En{H'{x)W). 
From the expression of g^^N given in (39), we find tliat 

-y^g3,N{x, W)=Xn + Yn. 
V A* 

Tlie term Yn has zero mean, and we bound its variance as follows: 
(45) EYn' = y^J:h"\x,)H'\x,)E{{W^ - l)^Wf ) < ^. 



The expression Xn is a sum of independent random variables, whose mean 

.2 

AT 



under the measure P is zero. We compute its variance directly as follows: 



4 = -^{EN{9i^%\x)i;N'Hx) + l8i;'^{x)Mx)ilj%{x) + lbij%'{x)) 

-18EN[H'{x)ij'N{x) + H"ix)] 

X EN[^p'N{x)i^N{x)H'(x) + iP';:,{x)H'{x)] 
+ 9ENH'\x){ENH"{x)f 
+ ISEnH" {x)En{H' {x)^'^{x))EnH''^{x) 
+ 9ENH'\x){EN{H'{x)tP'^{x))f 

(46) + ^ [-mEN{i^Ux)Mx)H"{x)H'{x)) 
-48En{^P'n{x)H"{x)H'{x))

- l2EN{i>'N{xWN{x)H'{x)H"{x)) 

-ASEn{^P%{x)H"{x)H'{x)) 

+ 36^^ {H' {x)i)'j^{x)) En (H" {x)H'\x)) 

+ 18En{H'^{x)H"{x))EnH"{x) + 36En{H"^{x)H'^{x)) 



+ 72^En{H"\x)H'\x))}. 



Ar2 

By inserting into the above terms the exphcit formula for ipN given in (16), 
expanding the products and rearranging terms, we get the representation 
rff = F3{E]\frs{x)). By replacing the vector of empirical averages E]\fr3{x) 
with that of expected values w.r.t. vr, the expression (43) is obtained. 

Next, setting u = v^, we obtain 



sup 



< sup 



< sup 

V 



< - $(u) 



\ TN 

JXn 

V TAT 



<vj- ^{v) 
<v] - ^(v) 



+ sup 



+ 1 V 



$ V 



TN 



TN_ 
T 



1 



^) 

TN/ 



where the last line has been obtained by a straightforward Lipschitz esti- 
mate. 

By using the formula given in [11], Lemma 1.9, page 20, again and the 
above estimate 



sup|P(^7V<'u)-$(n)| 

u 

^( Xn + Yn 
= sup r 

u \ T 

<suppf^ 

u \TN 



+ ¥{\YN\>eNT) + 



£N 



and by means of Esseen's inequality ([11], Theorem 5.4, page 149) for Xn/tn, 
Chebyshev's inequality and the estimate (45) for Yn, we arrive at 

sup \F{ An <u) -^{u)\ 

u 

1 C 



< 



{EN\^'l,ix)^P'Nix)\' + EN\i^%ix)\' 
+ EN\H'{x)f{EN{\H'{x)i;'N{x)f + \H"{x)f))} 
+ 1 V

from which the estimate (44) is obtained. □ 



T 



tnJ 



Remark 8. It is worth noting that when H = 0, that is, the target 
distribution has independent components, an easy integration by parts yields 

= j^,{97T{r\X)i^'\X)) + 187ri^p'{X)riX)r'iX)) + 157r(V''"'(X))} 

which coincides with the constant appearing in the paper [14] . 

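Lemma 9. Let $F : \mathbb{R}^m \to \mathbb{R}$ be a polynomial and $r_h : \mathbb{R}^2 \to \mathbb{R}$, $h = 1, \dots, m$,
be of the form $r_h(x_i, W_i) = b_h(x_i) W_i^{l_h}$, where $b_h$ belongs to $\mathcal{D}$. Define the
vector $\mathrm{E}r$ with the components in $\mathcal{D}$ by $(\mathrm{E}r)_h(x_i) = E\{ r_h(x_i, W_i) \}$. Then for
any $0 \le \gamma < 1/2$ and $\varepsilon > 0$,

$$P\bigl[ N^{\gamma} |F(E_N r(x, W)) - F(\pi((\mathrm{E}r)(X)))| > \varepsilon \bigr] \le \frac{C}{N^{1 - 2\gamma} \varepsilon^2}$$

holds for all $x \in F_N(\varepsilon)$, where

$$F_N(\varepsilon) = \{ x : |E_N(\mathrm{E}r)(x) - \pi((\mathrm{E}r)(X))| < \varepsilon N^{-\gamma} / 2K \},$$

and $K$ is a local Lipschitz constant for $F$ in a neighborhood of the point
$\pi((\mathrm{E}r)(X))$.

Proof. Let us notice that, when $x \in F_N(\varepsilon)$, we have

$$P\bigl( N^{\gamma} |F(E_N r(x, W)) - F(\pi((\mathrm{E}r)(X)))| > \varepsilon \bigr) \le P\bigl( N^{\gamma} |F(E_N(\mathrm{E}r)(x)) - F(E_N r(x, W))| > \varepsilon / 2 \bigr).$$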

Let us consider a generic monomial appearing in F{vi, . . . , Vm), which will 
be of the form H/iLi v'^'^ . By simple algebraic manipulations, 

mm ™ / \ 

n -![<''= E n ? (-/^ - uhY'^'--'''- 



h=l h=l {h,...,l,n):h-\ \-l,n>Oh=l 



h 



Now substitute the empirical average Ej^irh{x,W) into Vh and its centering 
EN{Kr)h{x) into u^. Denoting by s = r — Er, the above expression becomes 

m , ^ 



(h,...,U):li+-+ln,>Oh=l 
We proceed to bound the second moment of each term of the above sum in
the following way. The term $|\mathrm{E}r|$ is bounded by a constant so we are left to

Mh„...,h, ix) = niE^Sh, {x, W)f' ■ ■ ■ {ENSh, {x, 

where Sh{xi,Wi) = hh{x^)z[^^ with zf ^ = VF^"^ - EVFi"\ By using next 
Lemma 10, we finally get the bound 

Mh,,...,h,{x)<^. 

The proof is complete by an application of Chebyshev's inequality. □ 

Lemma 10. Let (Zj : z = 1, . . . , N) he i.i.d. centered r- dimensional ran- 
dom vectors. For any j = 1, . . . ,r define vj^^^ = h^^\xi)z\^\ Then for any 
aj > 0, j = 1, . . . , r, such that X]j=i Q^j = k, it holds 

<n(^E>-») ) 

(47) 

m=l'' \V\=m\ h^=\ / V h„=l / 

where h^^(x) =Y.^^^j^^\}P'^(x)Z^^\ and the sum is taken over partitions 
V = {Ai, . . . , Am} of the set of repeated indices / = {l,...,l,2,...,2,...,r, 
(where "1" is repeated 2ai times, "r" is repeated 2ar times) such that 
each As contains at least two elements of I. 



Proof. Begin by writing 

E 



■j=l\ i=l 



1 



N N N N 



(48) =^E-EE-E ^(y!i' -yS!- • • • 

«1 = 1 lfe = lSl=l Sfc = l 

X ...y(i) ...yW ...yM 

^1 Sa-^ ^aiA hc^r—l 

A summand in the last expression is zero as soon as there exists an index among $(i_1,\dots,i_k,s_1,\dots,s_k)$ whose value is not repeated by another. This follows by the independence and zero mean property of the $Y_i^{(j)}$. Another way of rearranging this sum is therefore as follows: partition the set $I$ of the upper indices of the formula (48) into a finite union $I = A_1\cup\cdots\cup A_m$, where $|A_s| \ge 2$ for each $s$. We write $Y_h^{A_s} = \prod_{j\in A_s} Y_h^{(j)}$ to simplify notation. Then the sum on the left-hand side is bounded above in absolute value by

$$\frac{1}{N^{2k}}\sum_{m=1}^{k}\ \sum_{|\mathcal{P}|=m}\ \sum_{\substack{h_1,\dots,h_m=1\\ h_l\ne h_t\ \text{if}\ l\ne t}}^{N} E\bigl|Y_{h_1}^{A_1}\cdots Y_{h_m}^{A_m}\bigr|. \tag{49}$$

Since the sum is over nonrepeating indices $h_1,\dots,h_m$, we have, by independence, $E|Y_{h_1}^{A_1}\cdots Y_{h_m}^{A_m}| = b^{A_1}(x_{h_1})\cdots b^{A_m}(x_{h_m})$. Now the summand in (49) is positive, so we can bound the sum from above by a sum over all (possibly repeating) indices $h_1,\dots,h_m$, and after rearranging the sum, we obtain (47).

□ 
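
The counting argument of Lemma 10 can be sanity-checked in the simplest case. The sketch below assumes $r = 1$, $b \equiv 1$ and Rademacher signs $Z_i = \pm 1$ (so every factor $b^{A}$ equals 1), and it treats the $2k$ positions of the index set as distinguishable. These are illustrative assumptions, not the setting of the paper. It compares the exact moment with the partition bound, using the standard recurrence for counting set partitions whose blocks all have size at least two.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import product


def exact_moment(N, k):
    """E[((Z_1 + ... + Z_N) / N)^(2k)] for i.i.d. Rademacher signs,
    computed by enumerating all 2^N sign vectors."""
    total = sum(sum(signs) ** (2 * k) for signs in product((-1, 1), repeat=N))
    return Fraction(total, 2 ** N * N ** (2 * k))


@lru_cache(maxsize=None)
def partitions_min2(n, m):
    """Number of partitions of n labelled positions into m blocks of size >= 2."""
    if n == 0 and m == 0:
        return 1
    if m <= 0 or n < 2 * m:
        return 0
    # The last position either joins one of the m blocks of a valid partition
    # of the other n-1 positions, or forms a new pair with one of them.
    return m * partitions_min2(n - 1, m) + (n - 1) * partitions_min2(n - 2, m - 1)


def partition_bound(N, k):
    """(1 / N^(2k)) * sum_m #{partitions into m blocks} * N^m,
    i.e. the right-hand side of the bound when every b^A equals 1."""
    total = sum(partitions_min2(2 * k, m) * N ** m for m in range(1, k + 1))
    return Fraction(total, N ** (2 * k))


for N in range(2, 7):
    for k in range(1, 4):
        assert exact_moment(N, k) <= partition_bound(N, k)
```

In this toy case both sides are of order $N^{-k}$: the dominant contribution comes from partitions into $k$ pairs, and the slack comes only from allowing repeated values of $h_1,\dots,h_m$.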



C 



'TEN < 



c 



N 



'ten < 



N 

c 



Lemma 11. It holds that 

(50) n\BM > ej,) =p( '"^'^,"f^' > r V.^ ) < 

(51) n\CM>e.) = r{\^^^^^>. 

(52) F(|I^.|>..)=f(|^^^^^ + ^ >. 

for $x \in F_{N,k}(\varepsilon_N)$, where

FnM^n) = [x : lEMEvkix) - 7r(Erfc(X))| < ^^3-fc^fc/6-i|^ 

for $k = 4,5,6$, where $K$ is the smallest of the local Lipschitz constants for
$F_k$ at $\pi(Er_k(X))$, $k = 4,5,6$.

Proof. By Lemma 6 we have $g_{k,N}(x,W) = NF_k(E_N r_k(x,W))$ for $k = 4,5,6$. The vectors $r_k$ and polynomials $F_k$ are of the type required by Lemma 9. In order to compute $F_k(\pi(Er_k(X)))$ for $k = 4,5,6$ we need to replace in (40)-(42) of Lemma 6 the empirical averages with expectations with respect to $\pi\times P$. By a straightforward computation,

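$$F_4(\pi(Er_4(X))) = -\frac{1}{24}\Bigl\{\bigl[3E(\psi''(X)\psi'^2(X)) + 3E(\psi''^2(X)) + 6E(\psi'''(X)\psi'(X)) + 3E(\psi''''(X))\bigr] - \bigl\{E[H'(X)\psi'(X) + H''(X)]\bigr\}^2\Bigr\}$$

$$= -\frac{1}{24}\Bigl[3c\int_{-\infty}^{+\infty}(e^{\psi}\psi'')''(x)\,dx - \Bigl(c\int_{-\infty}^{+\infty}(e^{\psi}H')'(x)\,dx\Bigr)^2\Bigr],$$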
since $X$ has the density $\pi(x) = ce^{\psi(x)}$ with normalizing constant $c$. Since by assumption (HP) both $(e^{\psi}\psi'')'(x)$ and $e^{\psi(x)}H'(x)$ are of the form $f(x)e^{\psi(x)}$ with $f$ bounded and $\psi(x)\to-\infty$ as $|x|\to+\infty$, the right-hand side of the previous expression is zero.
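
The vanishing of these integrals is an instance of integrating an exact derivative whose boundary terms decay: $3\psi''\psi'^2 + 3\psi''^2 + 6\psi'''\psi' + 3\psi''''$ times $e^{\psi}$ equals $3(e^{\psi}\psi'')''$, and $(H'\psi' + H'')e^{\psi}$ equals $(e^{\psi}H')'$. The sketch below checks this numerically for a hypothetical choice $\psi(x) = -x^2/2 - x^4/4$ and $H(x) = \cos x$. These functions are picked only for illustration and have not been checked against assumption (HP); the integration window and step count are also arbitrary.

```python
import math


def psi_derivs(x):
    """psi', psi'', psi''', psi'''' for the illustrative psi(x) = -x^2/2 - x^4/4."""
    return (-x - x ** 3, -1.0 - 3.0 * x ** 2, -6.0 * x, -6.0)


def weight(x):
    """Unnormalised density e^{psi(x)}."""
    return math.exp(-x ** 2 / 2.0 - x ** 4 / 4.0)


def trapezoid(f, a=-8.0, b=8.0, steps=16000):
    """Composite trapezoid rule on [a, b]; the weight is negligible outside."""
    h = (b - a) / steps
    inner = sum(f(a + i * h) for i in range(1, steps))
    return h * (0.5 * f(a) + inner + 0.5 * f(b))


def psi_combination(x):
    """e^psi (3 psi'' psi'^2 + 3 psi''^2 + 6 psi''' psi' + 3 psi'''') = 3 (e^psi psi'')''."""
    d1, d2, d3, d4 = psi_derivs(x)
    return weight(x) * (3 * d2 * d1 ** 2 + 3 * d2 ** 2 + 6 * d3 * d1 + 3 * d4)


def h_combination(x):
    """e^psi (H' psi' + H'') = (e^psi H')' for the illustrative H(x) = cos(x)."""
    d1 = psi_derivs(x)[0]
    return weight(x) * (-math.sin(x) * d1 - math.cos(x))


total_derivative = trapezoid(psi_combination)   # numerically zero
h_term = trapezoid(h_combination)               # numerically zero
single_term = trapezoid(lambda x: weight(x) * 3 * psi_derivs(x)[1] ** 2)  # not zero
```

The individual summands, such as the one in `single_term`, are far from zero; only the full combination integrates to zero.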

Next $F_5(\pi(Er_5(X))) = 0$, since each monomial in $r_5$ contains at least one
factor which is an odd power of $W$; hence it has mean zero. Finally,

Fe{Tr{Ere{X))) 

= -ji^{i5Eir'iX)i^'\X)) + 60Eii;"'iX)i;''iX)) 

+ 270E{^"'{X)i^"{X)il^'{X)) + 135^«'2(X)) 
+ 180E{i^""{X)tP'\X)) + 180S«"(X)V"(X)) 
+ l80E{^""'{X)i;'{X)) + 60E{i;"""{X)) 

- 90[E{H"{X) + H'{X)tP'{X)) 

xEiH'ix)ir'ix)+i^'ix)rixm 

- 90{2E{H"{X)i;'^{X)) + 2E{H" (X)4^" (X)) 

+ AE{H"'{X)ij'iX)) + 2EiH""iX))] 
X E{H" {X) + H' {X)il,' {X)) 
+ AhE{H'^{X)){E{H"{X)) + E{H\X)i^'{X))f] 

= -^^[{AhE{r\XW{Xf) 

+ mE{^'{X)i,"{X)^"'{X)) + 75E{^"'\X))) 

- 90E{H"{X) + H'{X)i;'{X)) 

+ EiH'iX)ir'iX)+^'{X)riX))) 

+ AbE{H'^{X)){E{H"{X))+E{H'{X)iP'{X))f] 

- ^{E{r{xw\x))+m'^'{x)i;"{x)r{x)) 

+ E{^"'\X)) + 2,E{ij""{X)^'\X)) 

+ 3E(V^""(XX(X)) + 3^«'"(X)^'(X)) 

+ ^,{[E{H"{X)^l^'\X)) + E{H"{X)i^"{X)) 
+ 2E{H"' {X)il^' {X)) + E{H""{X))] 
x[E{H"{X)+H'{X)i^'{X))]}, 
and this simplifies to



T 

Y 



because the first term in curly braces equals — ^ by (43), the second term 
is proportional to 

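$$E\bigl(\psi'''(X)\psi'^3(X) + 3\psi'(X)\psi''(X)\psi'''(X) + \psi'''^2(X) + 3\psi''''(X)\psi''(X) + 3\psi''''(X)\psi'^2(X) + 3\psi'''''(X)\psi'(X) + \psi''''''(X)\bigr) = c\int_{-\infty}^{+\infty}(e^{\psi}\psi''')'''\,dx = 0,$$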



and the third term in curly braces contains the multiplicative factor 

$$\int_{-\infty}^{+\infty}(e^{\psi}H')'\,dx = 0.$$

The last two displays equal zero by the same argument used before. There- 
fore, 

\g4^N{x,W)\ 



Ar2/3 ^^-'-^) 

= F{N^/^\Fi{ENn{x,W))\ >r^TeN), 
\g5,N\ix,W) 

> I TEN 



iV5/6 



g6,N{x,W) T 



+ 



N ■ 2 , 
= F{\Fe{ENre{x, W)) - Fe{Ere{X))\ > rVe^) 
so that the stated estimates follow directly from the previous lemma. □ 
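
The seven-term combination of derivatives of $\psi$ that appears in the proof above is the expansion of $e^{-\psi}(e^{\psi}\psi''')'''$. The sketch below verifies that expansion exactly with integer polynomial arithmetic, using the fact that $(e^{\psi}p)' = e^{\psi}(\psi' p + p')$. The polynomial chosen for $\psi$ and all helper names are illustrative assumptions; polynomials are stored as coefficient lists, lowest degree first.

```python
def p_add(p, q):
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]


def p_mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def p_scale(c, p):
    return [c * a for a in p]


def p_deriv(p):
    return [i * a for i, a in enumerate(p)][1:] or [0]


def p_trim(p):
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p


def tilted_deriv(p, dpsi):
    """Return q such that (e^psi p)' = e^psi q, i.e. q = psi' p + p'."""
    return p_add(p_mul(dpsi, p), p_deriv(p))


# Hypothetical psi of degree 6, so that all derivatives up to order 6 are nonzero.
psi = [0, 1, -2, 0, 3, 0, -1]
d = [psi]
for _ in range(6):
    d.append(p_deriv(d[-1]))  # d[n] is the n-th derivative of psi

# Left-hand side: e^{-psi} (e^psi psi''')''' via three tilted derivatives.
lhs = d[3]
for _ in range(3):
    lhs = tilted_deriv(lhs, d[1])

# Right-hand side: the seven-term sum.
terms = [
    p_mul(d[3], p_mul(d[1], p_mul(d[1], d[1]))),   # psi''' psi'^3
    p_scale(3, p_mul(d[1], p_mul(d[2], d[3]))),    # 3 psi' psi'' psi'''
    p_mul(d[3], d[3]),                             # psi'''^2
    p_scale(3, p_mul(d[4], d[2])),                 # 3 psi'''' psi''
    p_scale(3, p_mul(d[4], p_mul(d[1], d[1]))),    # 3 psi'''' psi'^2
    p_scale(3, p_mul(d[5], d[1])),                 # 3 psi''''' psi'
    d[6],                                          # psi''''''
]
rhs = [0]
for t in terms:
    rhs = p_add(rhs, t)

assert p_trim(lhs) == p_trim(rhs)
```

Since the combination is an exact third derivative of $e^{\psi}\psi'''$, its integral reduces to boundary terms, which is the mechanism the proof invokes.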

Lemma 12. For aN=i/N^^^, it holds 

(53) 



1 /""^^ 

(cjAT - uf-^Gu,N{x,W)du 



6! 







< 



EN 



_c 



Proof. By Markov's inequality and Lemma 6, we have 

N . d"^ 1 

(aN-u) -^Gu,N{x,W)du >eN 

1 



6! 







< 



6!eAr 



E 







(fjAT-u) -^Gu,N{x,W)du 
1 /"""^ fi 
blENJO 



^Gu,n{x,W) 



du 



1 f"'^ R 

oleTv Jo 

1 /-criV 



fc=o 



du 



< 



6\eN 



/ {uN - ufN ^ u^^Pk{ENPi{x)^t{Yu)W■,^ = 1, . . . , m)| du, 
•^0 fc=o 